AAAI.2019 - Humans and AI

Total: 9

#1 Deep Transformation Method for Discriminant Analysis of Multi-Channel Resting State fMRI

Authors: Abhay M S Aradhya ; Aditya Joglekar ; Sundaram Suresh ; M. Pratama

Analysis of resting-state functional Magnetic Resonance Imaging (rs-fMRI) data has been a challenging problem due to high homogeneity, large intra-class variability, limited samples and differences in acquisition technologies/techniques. These issues are predominant in the case of Attention Deficit Hyperactivity Disorder (ADHD). In this paper, we propose a new Deep Transformation Method (DTM) that extracts the discriminant latent feature space from rs-fMRI and projects it in the subsequent layer for classification of rs-fMRI data. The hidden transformation layer in DTM projects the original rs-fMRI data into a new space using the learning policy and extracts the spatio-temporal correlations of the functional activities as a latent feature space. The subsequent convolution and decision layers transform the latent feature space into high-level features and provide accurate classification. The performance of DTM has been evaluated using the ADHD200 rs-fMRI benchmark data with cross-validation. The results show that the proposed DTM achieves a mean classification accuracy of 70.36%, an improvement of 8.25% over state-of-the-art methodologies. The improvement is due to concurrent analysis of the spatio-temporal correlations between the different regions of the brain, and the method can be easily extended to study other cognitive disorders using rs-fMRI. Further, brain network analysis has been studied to identify the difference in functional activities and the corresponding regions behind cognitive symptoms in ADHD.

#2 AI-Sketcher: A Deep Generative Model for Producing High-Quality Sketches

Authors: Nan Cao ; Xin Yan ; Yang Shi ; Chaoran Chen

Sketch drawings have played an important role in assisting humans in communication and creative design since ancient times. This situation has motivated the development of artificial intelligence (AI) techniques for automatically generating sketches based on user input. Sketch-RNN, a sequence-to-sequence variational autoencoder (VAE) model, was developed for this purpose and is known as a state-of-the-art technique. However, it suffers from limitations, including the generation of low-quality results and its inability to support multi-class generation. To address these issues, we introduced AI-Sketcher, a deep generative model for generating high-quality multi-class sketches. Our model improves drawing quality by employing a CNN-based autoencoder to capture the positional information of each stroke at the pixel level. It also introduces an influence layer to more precisely guide the generation of each stroke by directly referring to the training data. To support multi-class sketch generation, we provided a conditional vector that can help differentiate sketches under various classes. The proposed technique was evaluated based on two large-scale sketch datasets, and results demonstrated its power in generating high-quality sketches.

#3 Election with Bribed Voter Uncertainty: Hardness and Approximation Algorithm

Authors: Lin Chen ; Lei Xu ; Shouhuai Xu ; Zhimin Gao ; Weidong Shi

Bribery in elections (or computational social choice in general) is an important problem that has received a considerable amount of attention. In the classic bribery problem, the briber (or attacker) bribes some voters in an attempt to make the briber’s designated candidate win an election. In this paper, we introduce a novel variant of the bribery problem, “Election with Bribed Voter Uncertainty” or BVU for short, accommodating the uncertainty that the vote of a bribed voter may or may not be counted. This uncertainty occurs either because a bribed voter may not cast its vote for fear of being caught, or because a bribed voter is indeed caught and therefore its vote is discarded. As a first step towards ultimately understanding and addressing this important problem, we show that it does not admit any multiplicative O(1)-approximation algorithm modulo standard complexity assumptions. We further show that there is an approximation algorithm that returns a solution with an additive-ε error in FPT time for any fixed ε.

#4 Human Motion Prediction via Learning Local Structure Representations and Temporal Dependencies

Authors: Xiao Guo ; Jongmoo Choi

Human motion prediction from motion capture data is a classical problem in computer vision, and conventional methods take the holistic human body as input. These methods ignore the fact that, in various human activities, different body components (limbs and the torso) have distinctive characteristics in terms of the moving pattern. In this paper, we argue that local representations of different body components should be learned separately and, based on this idea, propose a network, Skeleton Network (SkelNet), for long-term human motion prediction. Specifically, at each time-step, local structure representations of the input (human body) are obtained via SkelNet’s branches of component-specific layers, then the shared layer uses local spatial representations to predict the future human pose. Our SkelNet is the first to use local structure representations for predicting human motion. Then, for short-term human motion prediction, we propose a second network, named Skeleton Temporal Network (Skel-TNet). Skel-TNet consists of three components: SkelNet and a Recurrent Neural Network, which have advantages in learning spatial and temporal dependencies, respectively, for predicting human motion; and a feed-forward network that outputs the final estimation. Our methods achieve promising results on the Human3.6M dataset and the CMU motion capture dataset, and the code is publicly available.

#5 Lipper: Synthesizing Thy Speech Using Multi-View Lipreading

Authors: Yaman Kumar ; Rohit Jain ; Khwaja Mohd. Salik ; Rajiv Ratn Shah ; Yifang Yin ; Roger Zimmermann

Lipreading has many potential applications, such as in the domains of surveillance and video conferencing. Despite this, most of the work in building lipreading systems has been limited to classifying silent videos into classes representing text phrases. However, there are multiple problems associated with making lipreading a text-based classification task, such as its dependence on a particular language and vocabulary mapping. Thus, in this paper we propose a multi-view lipreading-to-audio system, namely Lipper, which models it as a regression task. The model takes silent videos as input and produces speech as the output. With multi-view silent videos, we observe an improvement over single-view speech reconstruction results. We show this by presenting an exhaustive set of experiments for speaker-dependent, out-of-vocabulary and speaker-independent settings. Further, we compare the delay values of Lipper with other speechreading systems in order to show the real-time nature of the audio produced. We also perform a user study on the audio produced in order to understand its level of comprehensibility.

#6 Goal-Oriented Dialogue Policy Learning from Failures

Authors: Keting Lu ; Shiqi Zhang ; Xiaoping Chen

Reinforcement learning methods have been used for learning dialogue policies. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues, and the very few successful dialogues in the early learning phase. Hindsight experience replay (HER) enables learning from failures, but the vanilla HER is inapplicable to dialogue learning due to the implicit goals. In this work, we develop two complex HER methods providing different tradeoffs between complexity and performance, and, for the first time, enable HER-based dialogue policy learning. Experiments using a realistic user simulator show that our HER methods perform better than existing experience replay methods (as applied to deep Q-networks) in learning rate.
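For readers unfamiliar with hindsight experience replay, the following is a minimal sketch of the vanilla relabeling idea on a toy task with explicit goals. It is not the paper's dialogue-specific methods; the `Transition` type, `sparse_reward`, and the integer states are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Transition:
    # Hypothetical toy transition: states and goals are plain integers.
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def sparse_reward(next_state: int, goal: int) -> float:
    # Reward is given only when the goal is reached (sparse reward).
    return 1.0 if next_state == goal else 0.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    # Vanilla HER, "final" strategy: pretend the state actually reached
    # at the end of the episode was the goal all along, and recompute
    # the rewards under that substituted goal.
    if not episode:
        return []
    achieved = episode[-1].next_state
    return [
        replace(t, goal=achieved, reward=sparse_reward(t.next_state, achieved))
        for t in episode
    ]


# A failed episode: the agent wanted to reach state 5 but stopped at 3.
episode = [
    Transition(state=0, action=1, next_state=1, goal=5, reward=0.0),
    Transition(state=1, action=1, next_state=2, goal=5, reward=0.0),
    Transition(state=2, action=1, next_state=3, goal=5, reward=0.0),
]

# Store both the original and the relabeled transitions for replay.
replay_buffer = episode + relabel_with_final_state(episode)
```

The relabeling step relies on the goal being an explicit, observable quantity, which is exactly what the abstract says is missing in dialogue.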

#7 Be Inaccurate but Don’t Be Indecisive: How Error Distribution Can Affect User Experience

Authors: Rafael R. Padovani ; Lucas N. Ferreira ; Levi H. S. Lelis

System accuracy is a crucial factor influencing user experience in intelligent interactive systems. Although accuracy is known to be important, little is known about the role of the system’s error distribution in user experience. In this paper we study, in the context of background music selection for tabletop games, how the error distribution of an intelligent system affects the user’s perceived experience. In particular, we show that supervised learning algorithms that solely optimize for prediction accuracy can make the system “indecisive”. That is, they can make the system’s errors sparsely distributed throughout the game session. We hypothesize that sparsely distributed errors can harm the users’ perceived experience and that it is preferable to use a model that is somewhat inaccurate but decisive rather than a model that is accurate but often indecisive. In order to test our hypothesis we introduce an ensemble approach with a restrictive voting rule that, instead of erring sparsely through time, errs consistently for a period of time. A user study in which people watched videos of Dungeons and Dragons sessions supports our hypothesis.
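To make the idea of a "decisive" ensemble concrete, here is a minimal sketch of one possible restrictive voting rule. The abstract does not specify the rule, so the unanimity condition below is an assumption, and the function name `restrictive_vote` and the labels `"calm"` and `"tense"` are hypothetical.

```python
from typing import List, Sequence


def restrictive_vote(votes_per_step: Sequence[Sequence[str]], initial: str) -> List[str]:
    # Assumed rule: the output label changes only when every ensemble
    # member agrees; otherwise the previous output is held.
    current = initial
    outputs = []
    for votes in votes_per_step:
        if len(set(votes)) == 1:
            current = votes[0]
        outputs.append(current)
    return outputs


# Each inner list holds the predictions of three classifiers at one time step.
votes = [
    ["calm", "calm", "calm"],
    ["calm", "tense", "calm"],
    ["tense", "tense", "calm"],
    ["tense", "tense", "tense"],
    ["calm", "tense", "tense"],
]
labels = restrictive_vote(votes, initial="calm")
# labels == ['calm', 'calm', 'calm', 'tense', 'tense']
```

Under this rule the output switches rarely, so any mistake persists for a stretch of time instead of flickering between labels.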

#8 Consensual Affine Transformations for Partial Valuation Aggregation

Authors: Hermann Schichl ; Meinolf Sellmann

We consider the task of aggregating scores provided by experts, each of whom has scored only a subset of all objects to be rated. Since experts only see a subset of all objects, they lack global information on the overall quality of all objects, as well as the global range in quality. Inherently, the only reliable information we get from experts is therefore the relative scores over the objects that each of them has scored. We propose several variants of a new aggregation framework that takes this into account by computing consensual affine transformations of each expert’s scores to reach a globally balanced view. Numerical comparisons with other aggregation methods, such as rank-based methods, Kemeny-Young scoring, and a maximum likelihood estimator, show that the new method gives significantly better results in practice. Moreover, the computation is practically affordable and scales well even to larger numbers of experts and objects.
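The general idea of rescaling each expert's partial scores with an affine map so that they agree on a common scale can be sketched as follows. This is a simple alternating least-squares heuristic written for illustration, not the framework proposed in the paper; the function `consensus_scores`, its parameters, and the toy data are all hypothetical. It assumes every object is scored by at least one expert and that no expert gives identical scores to all of their objects.

```python
import numpy as np


def consensus_scores(expert_scores, n_objects, n_iter=200):
    # expert_scores: list of dicts mapping object index -> raw score.
    def standardize(v):
        return (v - v.mean()) / v.std()

    def average(values_per_expert):
        # Average the experts' (transformed) scores object by object.
        total = np.zeros(n_objects)
        count = np.zeros(n_objects)
        for scores, values in zip(expert_scores, values_per_expert):
            idx = list(scores)
            total[idx] += values
            count[idx] += 1
        return total / count

    raw = [np.array(list(s.values()), dtype=float) for s in expert_scores]

    # Start from the average of per-expert z-scores.
    s = standardize(average([standardize(x) for x in raw]))

    for _ in range(n_iter):
        transformed = []
        for scores, x in zip(expert_scores, raw):
            idx = list(scores)
            # Least-squares affine map a * x + b towards the current consensus.
            a, b = np.polyfit(x, s[idx], 1)
            transformed.append(a * x + b)
        # Re-standardize so the consensus cannot collapse to a constant.
        s = standardize(average(transformed))
    return s


# Toy data: hidden qualities 0..5, each expert sees four objects on their own scale.
quality = np.arange(6, dtype=float)
experts = [
    {i: 2.0 * quality[i] + 1.0 for i in (0, 1, 2, 3)},
    {i: 0.5 * quality[i] - 3.0 for i in (2, 3, 4, 5)},
    {i: quality[i] + 10.0 for i in (0, 1, 4, 5)},
]
consensus = consensus_scores(experts, n_objects=6)
```

The consensus is only defined up to an overall affine rescaling, which is why the sketch standardizes it to zero mean and unit variance at every step.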

#9 CycleEmotionGAN: Emotional Semantic Consistency Preserved CycleGAN for Adapting Image Emotions

Authors: Sicheng Zhao ; Chuang Lin ; Pengfei Xu ; Sendong Zhao ; Yuchen Guo ; Ravi Krishna ; Guiguang Ding ; Kurt Keutzer

Deep neural networks excel at learning from large-scale labeled training data, but cannot generalize the learned knowledge well to new domains or datasets. Domain adaptation studies how to transfer models trained on one labeled source domain to another sparsely labeled or unlabeled target domain. In this paper, we investigate the unsupervised domain adaptation (UDA) problem in image emotion classification. Specifically, we develop a novel cycle-consistent adversarial model, termed CycleEmotionGAN, by enforcing emotional semantic consistency while adapting images cycle-consistently. By alternately optimizing the CycleGAN loss, the emotional semantic consistency loss, and the target classification loss, CycleEmotionGAN can adapt source domain images to have similar distributions to the target domain without using aligned image pairs. Simultaneously, the annotation information of the source images is preserved. Extensive experiments are conducted on the ArtPhoto and FI datasets, and the results demonstrate that CycleEmotionGAN significantly outperforms the state-of-the-art UDA approaches.